Search API Design Evaluation and Latency Budget

Learn approaches for meeting the non-functional requirements of the search API and for estimating its response time.

Introduction#

When designing an API, optimizing one set of parameters often means compromising on another because of the tradeoffs between them. The preceding lessons covered various aspects of modeling a search API, focusing mainly on its functionality. In this lesson, we focus on the non-functional aspects of the search API and how we meet them.

Non-functional requirements#

The non-functional requirements are discussed below.

Availability#

We improve the availability of the search API with rate limiting and API monitoring, which prevent our API and the back-end servers from being overwhelmed. Similarly, to avoid cascading failures in the internal services, we employ circuit breakers at various points. These help not only the availability of our API but also its reliability.
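As an illustration of the rate limiting mentioned above, here is a minimal token-bucket sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example; the lesson does not prescribe a specific rate-limiting algorithm.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter (illustrative sketch).

    Allows bursts of up to `capacity` requests and refills at
    `rate` tokens per second.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be
        rejected (for example, with HTTP 429 Too Many Requests)."""
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a real deployment, one bucket would typically be kept per API key or per client, usually at the API gateway, so that a single noisy client cannot exhaust the capacity shared by everyone else.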

Scalability#

We scale our API by running redundant servers at the back end, so that when one server is down, another is available to handle the search queries. We also cache the results of frequently searched queries. In addition, we place caches between the client and our services to deliver static content. This reduces the load on our servers and, consequently, lets us handle a large number of queries.
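Caching the results of frequent queries can be sketched as a small in-memory cache with a size limit and a time-to-live. The `QueryCache` class, its default sizes, and the least-recently-used eviction policy are assumptions for illustration; a production system would more likely use a dedicated caching service.

```python
import time
from collections import OrderedDict


class QueryCache:
    """In-memory cache for search results with LRU eviction and a TTL
    (illustrative sketch)."""

    def __init__(self, max_entries=1000, ttl_seconds=60.0, clock=time.monotonic):
        self._entries = OrderedDict()   # query -> (expires_at, results)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, query):
        """Return cached results, or None on a miss or an expired entry."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, results = entry
        if self._clock() >= expires_at:
            del self._entries[query]    # stale entry: drop it
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return results

    def put(self, query, results):
        """Store results and evict the least recently used entry if full."""
        self._entries[query] = (self._clock() + self._ttl, results)
        self._entries.move_to_end(query)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)
```

The TTL bounds how stale a cached answer can be, which matters for search because the underlying index keeps changing.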

Note: For more details on building scalable systems see the Grokking Modern System Design Interview for Engineers & Managers course.

Security#

We support TLS 1.2 and newer versions to provide a secure communication channel for exchanging data between client and server. Authentication in the search API can be handled in two ways:

  • A user without login: Since search is a public service, it’s possible to authenticate the requesting application (client) using the API key only.

  • A user with login: To provide a tailored response to users, it’s possible for end users to authenticate themselves using user credentials like username and password. Other than that, JWTs can also be used to obtain a personalized experience from the search service.
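The two cases above can be sketched as a single authorization check: the API key identifies the client, and an optional signed token identifies the user. Everything here is hypothetical and for illustration only: the key set, the secret, the helper names, and the choice of HS256 signing. The sketch checks only the signature; a real service should use a maintained JWT library and also validate the header algorithm and claims such as expiry.

```python
import base64
import hashlib
import hmac
import json

API_KEYS = {"demo-client-key"}   # hypothetical registered client keys
JWT_SECRET = b"demo-secret"      # hypothetical HS256 signing secret


def _b64url_encode(raw):
    # JWT segments are base64url-encoded without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment):
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(claims):
    """Create an HS256-signed token (used here only to produce sample input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(JWT_SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def verify_token(token):
    """Return the claims dict if the signature is valid, else None."""
    try:
        header, payload, signature = token.split(".")
        expected = hmac.new(JWT_SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature)):
            return None
        claims = json.loads(_b64url_decode(payload))
        return claims if isinstance(claims, dict) else None
    except (ValueError, TypeError):
        return None


def authorize(api_key, token=None):
    """Return 'rejected', 'generic', or 'personalized:<user>'."""
    if api_key not in API_KEYS:
        return "rejected"            # unknown client application
    if token is None:
        return "generic"             # user without login
    claims = verify_token(token)
    if claims is None or "sub" not in claims:
        return "rejected"            # a token was sent but is not valid
    return f"personalized:{claims['sub']}"
```

Rejecting a request that carries an invalid token, rather than silently falling back to generic results, is a design choice made in this sketch; either behavior is compatible with the description above.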

Low latency#

We use several techniques to reduce the latency of our search API. First, we use high-speed caches in the API gateway to store frequently searched queries that are generic and whose data is not updated in real time. Second, on the server side, we set a maximum time threshold for generating the results of each search query. If a query takes longer than the threshold, execution is halted, and the results found within the time limit are returned to the user. Third, we employ pagination, which reduces network latency by fetching results one page at a time instead of retrieving all results at once, which may span hundreds of pages. Finally, performing the filtering before passing the results to the search server reduces the overall latency, as explained in the previous lesson.
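The server-side time threshold and pagination described above can be sketched as follows. The function names, the `matches` predicate, and the `offset` parameter are assumptions for illustration; only the `limit` parameter is named elsewhere in this lesson.

```python
import time


def search_with_budget(candidates, matches, budget_seconds, clock=time.monotonic):
    """Scan candidates until they are exhausted or the time budget runs out.

    Returns (results, complete). `complete` is False when the scan was
    halted early and only partial results are returned.
    """
    deadline = clock() + budget_seconds
    results = []
    for item in candidates:
        if clock() >= deadline:
            return results, False    # budget exceeded: return what we have
        if matches(item):
            results.append(item)
    return results, True


def paginate(results, offset, limit):
    """Return one page of results instead of the full result set."""
    return results[offset:offset + limit]
```

Returning the `complete` flag alongside the results lets the API tell the client that the response is partial, so the client can decide whether to retry or show what it has.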

Point to Ponder

Question

If we set a time limit on searching a query on the server side, wouldn’t it affect the accuracy of the search results?

Answer

There is a tradeoff between latency and accuracy. However, queries that exceed the time limit are very rare; most queries are processed within the limit and produce the desired results. We should therefore choose a time limit that is long enough to have little impact on accuracy, yet short enough to keep latency low.

Achieving Non-Functional Requirements

Non-functional requirement: Availability
Approaches:

  • Implement rate limiting to avoid overloading the API
  • Use API monitoring tools to detect spikes or unusual activity

Non-functional requirement: Scalability
Approaches:

  • Replicate all the services involved in the search API
  • Use caching solutions

Non-functional requirement: Security
Approaches:

  • Use API keys to authenticate the client
  • Use user credentials for authentication to provide a personalized response

Non-functional requirement: Low latency
Approaches:

  • Use a high-speed cache in the API gateway
  • Use pagination to avoid unnecessary payload
  • Set a timeout for queries

Latency budget#

In this section, we estimate the response time of our search API. The response time can vary depending on the message size, cached response, and simple or complex filters in the query. Let’s start with the estimation of the request-response sizes and then calculate the response time.

Note: As discussed in the Back-of-the-Envelope Calculations for Latency chapter, in the case of GET, the average RTT remains the same regardless of the data size (due to the small request size), and the time to download the response varies by 0.4 milliseconds (ms) per KB.

  • Request size: A GET request has no body, so we assume a request size of 1.5 KB to account for query parameters such as query, sort, and filter.

  • Response size: The response size mainly depends on the number of results on a page, that is, the limit parameter in pagination. Assume the response body includes ten search results, five recommendations, and two ads. If each result is 1 KB, and the size of the recommendations is 5 KB, whereas ads are 5 KB, then the total size is equal to:

$$Size_{response} = (Size_{results} \times Number_{results}) + Size_{recommendations} + Size_{ads}$$

$$Size_{response} = (1 \times 10) + 5 + 5 = 20\ KB$$
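The size estimate above can also be written as a small function, which makes it easy to see how the `limit` parameter changes the payload. The function and parameter names are assumptions for illustration; the default values are the ones assumed in this lesson.

```python
def response_size_kb(num_results=10, result_kb=1.0,
                     recommendations_kb=5.0, ads_kb=5.0):
    """Estimate the response size in KB for one page of search results."""
    return num_results * result_kb + recommendations_kb + ads_kb


# One page with the lesson's assumptions (ten results of 1 KB each).
default_page_kb = response_size_kb()

# Doubling the page size adds only the extra results, since the
# recommendations and ads are counted once per page.
larger_page_kb = response_size_kb(num_results=20)
```

With the lesson's assumptions, the first call gives the 20 KB used in the rest of this section.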

Response time#

The calculator below computes the latency and estimates the response time of the search service from the request and response sizes.

Response Time Calculator of the Search API

  • Response size: 20 KB
  • Minimum latency: 198.5 ms
  • Maximum latency: 279.5 ms
  • Minimum response time: 202.5 ms
  • Maximum response time: 291.5 ms

Assuming the response size is 20 KB, the latency is calculated as follows:

$$Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KBs) = 120.5 + 70 + 0.4 \times 20 = 198.5\ ms$$

$$Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KBs) = 201.5 + 70 + 0.4 \times 20 = 279.5\ ms$$

Similarly, the response time is calculated using the following equation:

$$Time_{Response} = Time_{latency} + Time_{processing}$$

Now, for minimum response time, we use minimum values of base time and processing time:

$$Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 198.5\ ms + 4\ ms = 202.5\ ms$$

For maximum response time, we use maximum values of base time and processing time:

$$Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 279.5\ ms + 12\ ms = 291.5\ ms$$

Note: We assume a minimum processing time of 4 ms (when the API calls execute in parallel on all services) and a maximum processing time of 12 ms (when the services do not execute in parallel). The details of these estimations are provided in the Back-of-the-Envelope Calculations for Latency chapter.
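The calculation above can be reproduced with a short script. The constant and function names are assumptions for illustration; the numeric values are the base times, RTT, per-KB download time, and processing times used in this lesson.

```python
RTT_GET_MS = 70.0             # average round-trip time of a GET request
DOWNLOAD_MS_PER_KB = 0.4      # download time per KB of response
BASE_MS = (120.5, 201.5)      # (minimum, maximum) base time
PROCESSING_MS = (4.0, 12.0)   # (parallel, non-parallel) processing time


def latency_ms(response_kb):
    """Return (minimum, maximum) latency in ms for a given response size."""
    transfer = RTT_GET_MS + DOWNLOAD_MS_PER_KB * response_kb
    return BASE_MS[0] + transfer, BASE_MS[1] + transfer


def response_time_ms(response_kb):
    """Return (minimum, maximum) response time in ms: latency plus processing."""
    latency_min, latency_max = latency_ms(response_kb)
    return latency_min + PROCESSING_MS[0], latency_max + PROCESSING_MS[1]


if __name__ == "__main__":
    low, high = response_time_ms(20)
    print(f"Response time for 20 KB: {low:.1f} ms to {high:.1f} ms")
```

Because the download term is only 0.4 ms per KB, the estimate is dominated by the base time and the RTT, so moderate changes in page size have a small effect on the total.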

A summary of the overall response time for the search service is shown in the illustration below.

Latency and processing time of the search API

In this lesson, we have discussed how it’s possible to meet non-functional requirements by incorporating different techniques in our design. We also observed from the calculations above that the designed API has low latency.
